Abstract

We prove the first Chernoff-Hoeffding bounds for general (irreversible) finite-state Markov
chains based on the standard $L_1$ (variation distance) mixing-time of the chain. Specifically,
consider an ergodic Markov chain $M$ and a weight function $f : [n] \to [0, 1]$ on the state space $[n]$
of $M$ with mean $\mu = \mathbb{E}_{v \leftarrow \pi}[f(v)]$, where $\pi$ is the stationary distribution of $M$. A $t$-step random
walk $(v_1, \ldots, v_t)$ on $M$ starting from the stationary distribution $\pi$ has expected total weight
$\mathbb{E}[X] = \mu t$, where $X = \sum_{i=1}^{t} f(v_i)$. Let $T$ be the $L_1$ mixing-time of $M$. We show that the
probability of $X$ deviating from its mean by a multiplicative factor of $\delta$, i.e., $\Pr[|X - \mu t| \ge \delta\mu t]$,
is at most $\exp(-\Omega(\delta^2 \mu t / T))$ for $0 \le \delta \le 1$, and $\exp(-\Omega(\delta \mu t / T))$ for $\delta > 1$. In fact, the bounds
hold even if the weight functions $f_i$ for $i \in [t]$ are distinct, provided that all of them have the
same mean $\mu$.

We also obtain a simplified proof for the Chernoff-Hoeffding bounds based on the spectral
expansion $\lambda$ of $M$, which is the square root of the second largest eigenvalue (in absolute value)
of $M\tilde{M}$, where $\tilde{M}$ is the time-reversal Markov chain of $M$. We show that the probability
$\Pr[|X - \mu t| \ge \delta\mu t]$ is at most $\exp(-\Omega(\delta^2 (1 - \lambda)\mu t))$ for $0 \le \delta \le 1$, and $\exp(-\Omega(\delta(1 - \lambda)\mu t))$
for $\delta > 1$.

Both of our results extend to continuous time Markov chains, and to the case where the walk
starts from an arbitrary distribution $\varphi$, at a price of a multiplicative factor depending on the
distribution $\varphi$ in the concentration bounds.

1 Introduction

In this work, we establish large deviation bounds for random walks on general (irreversible) finite 
state Markov chains based on mixing properties of the chain in both discrete and continuous time 
settings. To introduce our results we focus on the discrete time setting, which we now describe. 

Let $M$ be an ergodic Markov chain with finite state space $V = [n]$ and stationary distribution
$\pi$. Let $(v_1, \ldots, v_t)$ denote a $t$-step random walk on $M$ starting from a distribution $\varphi$ on $V$. For
every $i \in [t]$, let $f_i : V \to [0, 1]$ be a weight function at step $i$ so that $\mathbb{E}_{v \leftarrow \pi}[f_i(v)] = \mu > 0$ for all $i$.
Define the total weight of the walk $(v_1, \ldots, v_t)$ by $X = \sum_{i=1}^{t} f_i(v_i)$. The expected total weight of
the random walk $(v_1, \ldots, v_t)$ satisfies $\mathbb{E}[\frac{1}{t} X] \approx \mu$ as $t \to \infty$.

When the $v_i$'s are drawn independently according to the stationary distribution $\pi$, a standard
Chernoff-Hoeffding bound says that

$$\Pr[|X - \mu t| \ge \delta\mu t] \le \begin{cases} 2\exp(-\Omega(\delta^2 \mu t)) & \text{for } 0 \le \delta \le 1, \\ 2\exp(-\Omega(\delta \mu t)) & \text{for } \delta > 1. \end{cases}$$



However, when $(v_1, \ldots, v_t)$ is a random walk on a Markov chain $M$, it is known that the concentration
bounds depend inherently on the mixing properties of $M$, that is, the speed at which a random
walk converges toward its stationary distribution.

Variants of Chernoff-Hoeffding bounds for random walks on Markov chains have been studied
in several fields with various motivations [H [TO1 CEH [121 El CE1 [Jj. For instance, these bounds are 
linked to the performance of Markov chain Monte Carlo integration techniques [111 19]. They have 
also been applied to various online learning problems [15], testing properties of a given graph [6],
leader election problems [TO] , analyzing the structure of the social networks [21 [13] , understanding 
the performance of data structures [3], and computational complexity [7]. Improving such bounds 
is therefore of general interest. 

We improve on previous work in two ways. First, all the existing deviation bounds, as far
as we know, are based on the spectral expansion $\lambda(M)$ of the chain $M$. This spectral expansion
$\lambda(M)$ characterizes how much $M$ can stretch vectors in $\mathbb{R}^n$ under a normed space defined by the
stationary distribution $\pi$, which coincides with the second largest absolute eigenvalue of $M$ when
$M$ is reversible. (A formal definition is deferred to Section 2.) The most general result for Markov
chains in this form (see, e.g. [12, 16]) is

$$\Pr[|X - \mu t| \ge \delta\mu t] \le c\,\|\varphi\|_\pi \exp(-\Omega(\delta^2 (1 - \lambda)\mu t)) \quad \text{for } 0 \le \delta \le 1, \qquad (1)$$



where $\varphi$ is an arbitrary initial distribution and $\|\cdot\|_\pi$ is the $\pi$-norm (which we define formally later).

However, for general irreversible Markov chains, the spectral expansion $\lambda$ does not directly
characterize the mixing time of a chain and thus may not be a suitable parameter for such bounds.
A Markov chain $M$ could mix rapidly, but have a spectral expansion $\lambda$ close to 1, in which case
Eq. (1) does not yield a meaningful bound. In fact there is a way to modify any given Markov chain
$M$ so that the modified Markov chain $M'$ has (asymptotically) the same mixing-time as $M$, but
the spectral expansion of $M'$ equals 1 (Appendix A gives a detailed construction). It is therefore
natural to seek a Chernoff-type bound for Markov chains directly parameterized by the chain's
mixing time $T$.

Second, most previous analyses for deviation bounds such as Eq. (1) are based on non-elementary
methods such as perturbation theory [12, 11, 17]. Kahale [10] and Healy [7] provided
two elementary proofs for reversible chains, but their results yield weaker bounds than those in
Eq. (1). Recently, Wagner [16] provided another elementary proof for reversible chains matching
the form in Eq. (1). Together with the technique of "reversiblization" [3, 12], Wagner's analysis
can be generalized to irreversible chains. However, his use of decoupling on the linear projections
outright arguably leads to a loss of insight; here we provide an approach based on directly tracing
the corresponding sequence of linear projections, in the spirit of [7]. This more elementary approach
allows us to tackle both reversible and irreversible chains in a unified manner that avoids the use
of "reversiblization".

As we describe below, we prove a Chernoff-type bound for general irreversible Markov chains
with general weight functions $f_i$ based on the standard $L_1$ (variation distance) mixing time of the
chain, using elementary techniques based on extending ideas from [7]. The exponents of our bounds
are tight up to a constant factor. As far as we know, this is the first result that shows that the
mixing time is sufficient to yield these types of concentration bounds for random walks on Markov
chains. Along the way we provide a unified proof for (1) for both reversible and irreversible chains
based only on elementary analysis. This proof may be of interest in its own right.



2 Preliminaries 

Throughout this paper we shall refer to $M$ as the discrete time Markov chain under consideration.
Depending on the context, $M$ shall be interpreted as either the chain itself or the corresponding
transition matrix (i.e. it is an $n$ by $n$ matrix such that $M_{i,j}$ represents the probability that a walk at
state $i$ will move to state $j$ in the next step). For the continuous time counterpart, we write $\Lambda$ as
the generator of the chain and let $M(t) = e^{t\Lambda}$, which represents the transition probability matrix
from $t_0$ to $t_0 + t$ for an arbitrary $t_0$.

Let $u$ and $w$ be two distributions over the state space $V$. The total variation distance between
$u$ and $w$ is

$$\|u - w\|_{TV} = \max_{A \subseteq V} \Big| \sum_{i \in A} u_i - \sum_{i \in A} w_i \Big| = \frac{1}{2}\|u - w\|_1.$$

Let $\epsilon > 0$. The mixing time of a discrete time Markov chain $M$ is $T(\epsilon) = \min\{t : \max_x \|xM^t - \pi\|_{TV} \le \epsilon\}$,
where $x$ is an arbitrary initial distribution. The mixing time of a continuous time Markov chain
specified by the generator $\Lambda$ is $T(\epsilon) = \min\{t : \max_x \|xM(t) - \pi\|_{TV} \le \epsilon\}$, where $M(t) = e^{t\Lambda}$.
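
The two definitions above can be made concrete on a small chain. The following is a minimal Python sketch and not part of the original analysis: the helper names (`tv_distance`, `step`, `stationary`, `mixing_time`) and the two-state example are illustrative assumptions, and the stationary distribution is only approximated by iteration. The sketch relies on the fact that the worst starting distribution is a point mass, since the total variation distance to $\pi$ is convex in the starting distribution.

```python
def tv_distance(u, w):
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(u, w))


def step(x, M):
    """One step of the chain: row vector x times transition matrix M."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def stationary(M, iters=10_000):
    """Approximate the stationary distribution by iterating from the uniform distribution."""
    x = [1.0 / len(M)] * len(M)
    for _ in range(iters):
        x = step(x, M)
    return x


def mixing_time(M, eps, max_t=10_000):
    """Smallest t >= 1 with max_x ||x M^t - pi||_TV <= eps.

    Only the n point-mass starting distributions are tracked, because the
    maximum over all starting distributions is attained at one of them.
    """
    n = len(M)
    pi = stationary(M)
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(1, max_t + 1):
        rows = [step(r, M) for r in rows]
        if max(tv_distance(r, pi) for r in rows) <= eps:
            return t
    raise ValueError("chain did not mix within max_t steps")


# Hypothetical example: a two-state chain that switches state with probability p.
p = 0.1
M = [[1 - p, p], [p, 1 - p]]
print(mixing_time(M, eps=1 / 8))
```

For this example the distance from a point mass to the uniform distribution after $t$ steps is $(1 - 2p)^t / 2$, so the printed mixing time can be checked by hand.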

We next define an inner product space specified by the stationary distribution $\pi$:

Definition 2.1 (Inner product under $\pi$-kernel). Let $M$ be an ergodic Markov chain with state space
$[n]$ and $\pi$ be its stationary distribution. Let $u$ and $v$ be two vectors in $\mathbb{R}^n$. The inner product under
the $\pi$-kernel is $\langle u, v \rangle_\pi = \sum_{x \in [n]} \frac{u_x v_x}{\pi(x)}$.

We may verify that $\langle \cdot, \cdot \rangle_\pi$ indeed forms an inner product space by checking it is symmetric, linear
in the first argument, and positive definite. The $\pi$-norm of a vector $u$ in $\mathbb{R}^n$ is $\|u\|_\pi = \sqrt{\langle u, u \rangle_\pi}$.
Note that $\|\pi\|_\pi = 1$. For a vector $x \in \mathbb{R}^n$, we write $x^{\parallel} = \langle x, \pi \rangle_\pi \pi$ for its component along the
direction of $\pi$ and $x^{\perp} = x - x^{\parallel}$ for its component perpendicular to $\pi$.
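
As a small illustration of the $\pi$-inner product and the parallel/perpendicular decomposition, here is a hedged Python sketch. The function names (`inner_pi`, `norm_pi`, `decompose`) and the three-state distribution are assumptions chosen for the example only.

```python
import math


def inner_pi(u, v, pi):
    """<u, v>_pi = sum_x u_x * v_x / pi_x."""
    return sum(a * b / q for a, b, q in zip(u, v, pi))


def norm_pi(u, pi):
    """pi-norm induced by the inner product above."""
    return math.sqrt(inner_pi(u, u, pi))


def decompose(x, pi):
    """Split x into its component along pi and its component perpendicular to pi."""
    c = inner_pi(x, pi, pi)              # equals sum(x), because pi_x / pi_x = 1
    par = [c * q for q in pi]
    perp = [a - b for a, b in zip(x, par)]
    return par, perp


pi = [0.5, 0.3, 0.2]                     # hypothetical stationary distribution
phi = [1.0, 0.0, 0.0]                    # point mass on the first state
par, perp = decompose(phi, pi)
print(norm_pi(pi, pi), inner_pi(perp, pi, pi), norm_pi(phi, pi) ** 2)
```

For any distribution $\varphi$ the parallel component is $\pi$ itself, and the Pythagorean identity $\|\varphi\|_\pi^2 = \|\varphi^{\parallel}\|_\pi^2 + \|\varphi^{\perp}\|_\pi^2$ holds, which is the form in which the decomposition is used later in the proofs.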

We next define the spectral norm of a transition matrix. 

Definition 2.2 (Spectral norm). Let $M$ be the transition matrix of an ergodic Markov chain. Define
the spectral norm of $M$ as $\lambda(M) = \max_{\langle x, \pi \rangle_\pi = 0} \frac{\|xM\|_\pi}{\|x\|_\pi}$.

When $M$ is clear from the context, we shall simply write $\lambda$ for $\lambda(M)$. We shall also refer to
$1 - \lambda(M)$ as the spectral gap of the chain $M$. In the case when $M$ is reversible, $\lambda(M)$ coincides
with the second largest eigenvalue of $M$ in absolute value (the largest eigenvalue of $M$ is always 1). However, when
$M$ is irreversible, such a relation does not hold (one hint is that the eigenvalues of $M$ for
an irreversible chain can be complex, so the notion of being the second largest may not even
be well defined). Nevertheless, we can still connect $\lambda(M)$ with an eigenvalue of a matrix related
to $M$. Specifically, let $\tilde{M}$ be the time reversal of $M$: $\tilde{M}(x, y) = \pi(y) M(y, x) / \pi(x)$. The
reversiblization $R(M)$ of $M$ is $R(M) = M\tilde{M}$. The value of $\lambda(M)$ then coincides with the square
root of the second largest eigenvalue of $R(M)$, i.e. $\lambda(M) = \sqrt{\lambda(R(M))}$. Finally, notice that the
stationary distributions of $M$, $\tilde{M}$, and $R$ are all the same. These facts can be found in [3].
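
The relation $\lambda(M) = \sqrt{\lambda(R(M))}$ suggests a simple way to estimate the spectral expansion numerically. The sketch below is an illustration under stated assumptions rather than a method from the paper: it runs power iteration on $R = M\tilde{M}$ restricted to vectors perpendicular to $\pi$, the helper names are hypothetical, and the fixed start vector could in principle miss the top eigenvector. The example chain is a lazy directed 3-cycle (irreversible, uniform stationary distribution), for which $M M^{T} = (I + J)/4$ and hence $\lambda(M) = 1/2$ by hand.

```python
import math


def time_reversal(M, pi):
    """Time reversal: M~(x, y) = pi(y) * M(y, x) / pi(x)."""
    n = len(M)
    return [[pi[y] * M[y][x] / pi[x] for y in range(n)] for x in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_expansion(M, pi, iters=200):
    """Estimate lambda(M) as the square root of the top eigenvalue of R = M M~ on pi-perp."""
    n = len(M)
    R = matmul(M, time_reversal(M, pi))
    x = [1.0] + [0.0] * (n - 1)                  # arbitrary start vector (an assumption)
    rayleigh = 0.0
    for _ in range(iters):
        s = sum(x)                               # <x, pi>_pi
        x = [a - s * q for a, q in zip(x, pi)]   # remove the component along pi
        norm = math.sqrt(sum(a * a / q for a, q in zip(x, pi)))
        if norm < 1e-300:                        # nothing left perpendicular to pi
            break
        x = [a / norm for a in x]
        y = [sum(x[i] * R[i][j] for i in range(n)) for j in range(n)]
        rayleigh = sum(a * b / q for a, b, q in zip(y, x, pi))   # <xR, x>_pi
        x = y
    return math.sqrt(max(rayleigh, 0.0))


# Lazy directed 3-cycle: stay with probability 1/2, move to the next state otherwise.
M = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
pi = [1 / 3, 1 / 3, 1 / 3]
print(spectral_expansion(M, pi))
```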

3 Chernoff-Hoeffding Bounds for Discrete Time Markov Chains 

We now present our main result formally. 

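Theorem 3.1. Let $M$ be an ergodic Markov chain with state space $[n]$ and stationary distribution
$\pi$. Let $T = T(\epsilon)$ be its $\epsilon$-mixing time for $\epsilon \le 1/8$. Let $(V_1, \ldots, V_t)$ denote a $t$-step random walk on
$M$ starting from an initial distribution $\varphi$ on $[n]$, i.e., $V_1 \leftarrow \varphi$. For every $i \in [t]$, let $f_i : [n] \to [0, 1]$
be a weight function at step $i$ such that the expected weight $\mathbb{E}_{v \leftarrow \pi}[f_i(v)] = \mu$ for all $i$. Define the
total weight of the walk $(V_1, \ldots, V_t)$ by $X = \sum_{i=1}^{t} f_i(V_i)$. There exists some constant $c$ (which is
independent of $\mu$, $\delta$ and $\epsilon$) such that

1. $\Pr[X \ge (1 + \delta)\mu t] \le \begin{cases} c\,\|\varphi\|_\pi \exp(-\delta^2 \mu t / (72T)) & \text{for } 0 \le \delta \le 1 \\ c\,\|\varphi\|_\pi \exp(-\delta \mu t / (72T)) & \text{for } \delta > 1 \end{cases}$

2. $\Pr[X \le (1 - \delta)\mu t] \le c\,\|\varphi\|_\pi \exp(-\delta^2 \mu t / (72T))$ for $0 \le \delta \le 1$.
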
Before we continue our analysis, we remark on some aspects of the result. 

Optimality of the bound The bound given in Theorem 3.1 is optimal among all bounds based on
the mixing time of the Markov chain, in the sense that for any given $T$ and constant $\epsilon$, one can find
a $\delta$, a family of functions $\{f_i : V \to [0, 1]\}$, and a Markov chain with mixing time $T(\epsilon) = T$ that has
deviation probabilities matching the exponents displayed in Theorem 3.1 up to a constant factor.
In this regard, the form of our dependency on $T$ is tight for constant $\epsilon$. For example, consider the
following Markov chain:

• The chain consists of 2 states $s_1$ and $s_2$.

• At any time step, with probability $p$ the random walk jumps to the other state and with
probability $1 - p$ it stays in its current state, where $p$ is determined below.

• For all $f_i$, we have $f_i(s_1) = 1$ and $f_i(s_2) = 0$.

Notice that the stationary distribution is uniform and $T(\epsilon) = \Theta(1/p)$ when $\epsilon$ is a constant. Thus,
we shall set $p = \Theta(1/T)$ so that the mixing-time $T(\epsilon) = T$. Let us consider a walk starting from
$s_1$ for sufficiently large length $t$. The probability that the walk stays entirely in $s_1$ up to time
$t$ is $(1 - p)^t \approx e^{-tp} = \exp(-\Theta(t/T))$. In other words, for $\delta = 1$ we have $\Pr[X \ge (1 + \delta)\mu t] =
\Pr[X \ge t] = \Pr[\text{the walk stays entirely in } s_1] = \exp(-\Theta(t/T(\epsilon)))$. This matches the first bound in
Theorem 3.1 asymptotically, up to a constant factor in the exponent. The second bound can be
matched similarly by switching the values of $f_i(\cdot)$ on $s_1$ and $s_2$. Finally, we remark that this example
only works for $\epsilon = \Theta(1)$, which is how mixing times appear in the usual contexts. It remains open,
though, whether our bounds are still optimal when $\epsilon = o(1)$.
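
To see the approximation $(1 - p)^t \approx e^{-tp}$ from the two-state example at work, the following short Python sketch compares the two quantities for $p = 1/T$. The value $T = 50$ and the function name `stay_probability` are assumptions made only for this illustration.

```python
import math


def stay_probability(p, t):
    """Probability of t consecutive 'stay' moves, each taken with probability 1 - p."""
    return (1 - p) ** t


T = 50            # assumed mixing-time scale, for illustration only
p = 1 / T         # switching probability p = Theta(1/T)
for t in (T, 5 * T, 20 * T):
    exact = stay_probability(p, t)
    approx = math.exp(-t * p)
    # log(exact) / (-t/T) is the constant hidden in exp(-Theta(t/T))
    print(t, exact, approx, math.log(exact) / (-t / T))
```

The last printed column is the same for every $t$ (it equals $-T \log(1 - 1/T)$, which is slightly above 1), which is the sense in which the exponent is $\Theta(t/T)$.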

Dependency on the threshold $\epsilon$ of the mixing time Note that the dependence on $\epsilon$ enters only
through $T(\epsilon)$. Since $T(\epsilon)$ is non-increasing in $\epsilon$, it is obvious that $\epsilon = 1/8$ gives the best bound in
the setting of Theorem 3.1. In fact, a more general form of our bound, as will be seen along our
derivation later, replaces $1/72$ in the exponent by a factor $(1 - \sqrt{2\epsilon})/36$. Hence the optimal choice
of $\epsilon$ is the maximizer of $(1 - \sqrt{2\epsilon})/T(\epsilon)$ (with $\epsilon < 1/2$), which differs for different Markov chains.
Such a formulation seems to offer only incremental improvement, so we choose to focus on the form in
Theorem 3.1.

Comparison with spectral expansion based Chernoff bound The bound given in Theorem 3.1
is not always stronger than the spectral expansion based Chernoff bound (1) that is presented
in, for example, Lezaud [12] and Wagner [16]. Consider, for instance, a random constant degree
regular graph $G$. One can see that the spectral gap of the Markov chain induced by a random walk
over $G$ is a constant with high probability. On the other hand, the mixing time of the chain is at
least $\Omega(\log n)$ because the diameter of a constant degree graph is at least $\Omega(\log n)$. Lezaud [12] or
Wagner [16] gives us a concentration bound $\Pr[X \ge (1 + \delta)\mu t] \le c\,\|\varphi\|_\pi \exp(-\Omega(\delta^2 \mu t))$ when $\delta \le 1$,
while Theorem 3.1 gives us $\Pr[X \ge (1 + \delta)\mu t] \le c\,\|\varphi\|_\pi \exp(-\Theta(\delta^2 \mu t / \log n))$.

Comparison with a union bound Assuming the spectral expansion based Chernoff bound in
Lezaud [12] and Wagner [16], there is a simpler analysis that yields a mixing time based bound in a
similar but weaker form than Theorem 3.1: we first divide the random walk $(V_1, \ldots, V_t)$ into $T(\epsilon)$ groups
for a sufficiently small $\epsilon$ such that the $i$-th group consists of the sub-walk $V_i, V_{i+T(\epsilon)}, V_{i+2T(\epsilon)}, \ldots$. The
walk in each group is then governed by the Markov chain $M^{T(\epsilon)}$. This Markov chain has unit mixing
time and as a result, its spectral expansion can be bounded by a constant (by using our Claim 3.1
below). Together with a union bound across different groups, we obtain

$$\Pr[X \ge (1 + \delta)\mu t] \le \begin{cases} cT\,\|\varphi\|_\pi \exp(-\Omega(\delta^2 \mu t / T)) & \text{for } 0 \le \delta \le 1 \\ cT\,\|\varphi\|_\pi \exp(-\Omega(\delta \mu t / T)) & \text{for } \delta > 1. \end{cases} \qquad (2)$$



Theorem 3.1 shaves off the extra leading factors of $T$ in these inequalities, which has significant
implications. For example, Eq. (2) requires the walk length to be at least $\Omega(T \log T)$, while our bounds
address walk lengths between $T$ and $T \log T$. Our tighter bound can also become important
when we need a tighter polynomial tail bound.

As a specific example, saving the factor of $T$ becomes significant when we generalize these
bounds to continuous-time chains using the discretization strategy in Fill [3] and Lezaud [12]. The
strategy is to apply a known discrete time bound on the discretized continuous time chain, say in a
scale of $b$ units of time, followed by taking the limit as $b \to 0$ to yield the corresponding continuous
time bound. Using this to obtain a continuous analog of Eq. (2) does not work, since under the
$b$-scaled discretization the mixing time becomes $T/b$, which implies that the leading factor in Eq.
(2) goes to infinity in the limit as $b \to 0$.

We now proceed to prove Theorem 3.1.

Proof. (of Theorem 3.1) We partition the walk $V_1, \ldots, V_t$ into $T = T(\epsilon)$ subgroups so that the $i$-th
sub-walk consists of the steps $(V_i, V_{i+T}, \ldots)$. These sub-walks can be viewed as generated from the
Markov chain $N = M^T$. Also, denote $X^{(i)} = \sum_{0 \le j < t/T} f_{i+jT}(V_{i+jT})$ as the total weight for each
sub-walk and $\bar{X} = \sum_{i=1}^{T} X^{(i)} / T$ as the average total weight.
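
The partition into interleaved sub-walks is easy to mirror in code. The sketch below is illustrative only: the function names, the 7-step walk and the weight function are hypothetical, and weight functions are represented as dictionaries from states to values in $[0, 1]$.

```python
def split_into_subwalks(walk, T):
    """Group walk = [V_1, ..., V_t] into T interleaved sub-walks.

    Sub-walk i (1-based) contains V_i, V_{i+T}, V_{i+2T}, ...
    """
    return [walk[i::T] for i in range(T)]


def total_weights(walk, weights, T):
    """Return [X^(1), ..., X^(T)] and their average X-bar.

    weights[k] plays the role of f_{k+1}, a dict from state to a value in [0, 1].
    """
    per_step = [f[v] for f, v in zip(weights, walk)]
    x_sub = [sum(per_step[i::T]) for i in range(T)]
    return x_sub, sum(x_sub) / T


walk = ["a", "b", "b", "a", "a", "b", "a"]     # a hypothetical 7-step walk
f = {"a": 1.0, "b": 0.0}
x_sub, x_bar = total_weights(walk, [f] * len(walk), T=3)
print(split_into_subwalks(walk, 3), x_sub, x_bar)
```

Since $X = \sum_i X^{(i)} = T \bar{X}$, the event $X \ge (1 + \delta)\mu t$ is the same as $\bar{X} \ge (1 + \delta)\mu t / T$, which is how the sub-walks enter the proof.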

Next, we follow Hoeffding's approach [8] to cope with the correlation among the $X^{(i)}$. To start,

$$\Pr[X \ge (1 + \delta)\mu t] = \Pr[\bar{X} \ge (1 + \delta)\mu t / T] \le \frac{\mathbb{E}[e^{r\bar{X}}]}{e^{r(1+\delta)\mu t / T}}. \qquad (3)$$

Now noting that $\exp(\cdot)$ is a convex function, we use Jensen's inequality to obtain

$$\mathbb{E}[e^{r\bar{X}}] \le \sum_{i \le T} \frac{1}{T}\,\mathbb{E}[e^{rX^{(i)}}]. \qquad (4)$$

We shall focus on giving an upper bound on $\mathbb{E}[e^{rX^{(i)}}]$. This requires two steps:

• First, we show the chain $N$ has a constant spectral gap based on the fact that it takes one
step to mix.

• Second, we apply a bound on the moment generating function of $X^{(i)}$ using its spectral
expansion.

Specifically, we shall prove the following claims, whose proofs will be deferred to the next two 
subsections. 

Claim 3.1. Let $M$ be a general ergodic Markov chain with $\epsilon$-mixing time $T(\epsilon)$. We have $\lambda(M^{T(\epsilon)}) \le \sqrt{2\epsilon}$.

Claim 3.2. Let $M$ be an ergodic Markov chain with state space $[n]$, stationary distribution $\pi$, and
spectral expansion $\lambda = \lambda(M)$. Let $(V_1, \ldots, V_t)$ denote a $t$-step random walk on $M$ starting from an
initial distribution $\varphi$ on $[n]$, i.e., $V_1 \leftarrow \varphi$. For every $i \in [t]$, let $f_i : [n] \to [0, 1]$ be a weight function
at step $i$ such that the expected weight $\mathbb{E}_{v \leftarrow \pi}[f_i(v)] = \mu$ for all $i$. Define the total weight of the walk
$(V_1, \ldots, V_t)$ by $X = \sum_{i=1}^{t} f_i(V_i)$. There exists some constant $c$ and a parameter $r > 0$ that depends
only on $\lambda$ and $\delta$ such that

1. $\dfrac{\mathbb{E}[e^{rX}]}{e^{r(1+\delta)\mu t}} \le \begin{cases} c\,\|\varphi\|_\pi \exp(-\delta^2 (1 - \lambda)\mu t / 36) & \text{for } 0 \le \delta \le 1 \\ c\,\|\varphi\|_\pi \exp(-\delta (1 - \lambda)\mu t / 36) & \text{for } \delta > 1. \end{cases}$

2. $\dfrac{\mathbb{E}[e^{-rX}]}{e^{-r(1-\delta)\mu t}} \le c\,\|\varphi\|_\pi \exp(-\delta^2 (1 - \lambda)\mu t / 36)$ for $0 \le \delta \le 1$.

Claim 3.1 gives a bound on the spectral expansion of each sub-walk $X^{(i)}$, utilizing the fact that
they have unit mixing times. Claim 3.2 is a spectral version of Chernoff bounds for Markov chains.
As stated previously, while similar results exist, we provide our own elementary proof of Claim 3.2
both for completeness and because it may be of independent interest.

We now continue the proof assuming these two claims. Using Claim 3.1 we know $\lambda(N) \le \frac{1}{2}$.
Next, by Claim 3.2, for the $i$-th sub-walk, we have

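$$\frac{\mathbb{E}[e^{rX^{(i)}}]}{e^{r(1+\delta)\mu t / T}} \le \begin{cases} c\,\|\varphi M^i\|_\pi \exp(-\delta^2 \mu t / (72T)) & \text{for } 0 \le \delta \le 1 \\ c\,\|\varphi M^i\|_\pi \exp(-\delta \mu t / (72T)) & \text{for } \delta > 1 \end{cases} \qquad (5)$$

for an appropriately chosen $r$ (which depends only on $\lambda$ and $\delta$ and hence is the same for all $i$). Note
that $M^i$ arises because $X^{(i)}$ starts from the distribution $\varphi M^i$. On the other hand, notice that
$\|\varphi M^i\|_\pi^2 = \|\varphi^{\parallel} M^i\|_\pi^2 + \|\varphi^{\perp} M^i\|_\pi^2 \le \|\varphi^{\parallel}\|_\pi^2 + \lambda^2(M^i)\,\|\varphi^{\perp}\|_\pi^2 \le \|\varphi\|_\pi^2$ (by using Lemma 3.3), or in
other words $\|\varphi M^i\|_\pi \le \|\varphi\|_\pi$. Together with (3) and (4), we obtain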



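$$\Pr[X \ge (1 + \delta)\mu t] \le \begin{cases} c\,\|\varphi\|_\pi \exp(-\delta^2 \mu t / (72T)) & \text{for } 0 \le \delta \le 1 \\ c\,\|\varphi\|_\pi \exp(-\delta \mu t / (72T)) & \text{for } \delta > 1. \end{cases}$$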



This proves the first half of the theorem. The second case can be proved in a similar manner,
namely that

$$\Pr[X \le (1 - \delta)\mu t] = \Pr[\bar{X} \le (1 - \delta)\mu t / T] \le \frac{\mathbb{E}[e^{-r\bar{X}}]}{e^{-r(1-\delta)\mu t / T}} \le \frac{\sum_{k=1}^{T} \mathbb{E}[e^{-rX^{(k)}}] / T}{e^{-r(1-\delta)\mu t / T}},$$

again by Jensen's inequality applied to $\exp(\cdot)$.

□

3.1 Mixing Time vs. Spectral Expansion

In this subsection we prove Claim 3.1. We remark that Sinclair [14] presents a similar result for
reversible Markov chains: for every parameter $\epsilon \in (0, 1)$,

$$\frac{1}{2} \cdot \frac{\lambda(M)}{1 - \lambda(M)} \log\left(\frac{1}{2\epsilon}\right) \le T(\epsilon), \qquad (6)$$

where $T(\epsilon)$ is the $\epsilon$-mixing-time of $M$. However, it is impossible in general to bound
$\lambda(M)$ in terms of the mixing time for irreversible chains, because a chain $M$ can have
$\lambda(M) = 1$ while the $\epsilon$-mixing-time of $M$ is, say, $T(\epsilon) = 2$ for some constant $\epsilon$ (and $\lambda(M^2) \ll 1$).

In light of this issue, our proof of Claim 3.1 depends crucially on the fact that $M^{T(\epsilon)}$ has mixing
time 1, which, as we shall see, translates to a bound on its spectral expansion that holds regardless
of reversibility. We need the following result on reversible Markov chains, which is a stronger result
than the bound from [14] displayed above.

Lemma 3.2. Let $0 < \epsilon \le 1/2$ be a parameter. Let $M$ be an ergodic reversible Markov chain with
$\epsilon$-mixing time $T(\epsilon)$ and spectral expansion $\lambda(M)$. It holds that $\lambda(M) \le (2\epsilon)^{1/T(\epsilon)}$.

We remark that it appears possible to prove Lemma 3.2 by adopting an analysis similar to
Aldous' [1], who addressed the continuous time case. We present an alternative proof that is
arguably simpler; in particular, our proof does not use the spectral representation theorem as used
in [1] and does not involve arguments that take the number of steps to infinity.

Proof. (of Lemma 3.2) Recall that for an ergodic reversible Markov chain $M$, it holds that $\lambda(M^t) =
\lambda^t(M)$ for every $t \in \mathbb{N}$. Hence, it suffices to show that $\lambda(M^{T(\epsilon)}) \le 2\epsilon$. Also, recall that $\lambda(M^{T(\epsilon)})$
is simply the second largest eigenvalue (in absolute value) of $M^{T(\epsilon)}$. Let $v$ be the corresponding
eigenvector, i.e. $v$ satisfies $vM^{T(\epsilon)} = \lambda(M^{T(\epsilon)})v$. Since $M$ is reversible, the entries of $v$ are real-valued.
Also, notice that $v$ is a left eigenvector of $M$ while $(1, 1, \ldots, 1)^T$ is a right eigenvector of $M$
(using the fact that each row of $M$ sums to one). Furthermore, $v$ and $(1, \ldots, 1)^T$ do not share the
same eigenvalue. So we have $\langle v, (1, \ldots, 1)^T \rangle = 0$, i.e. $\sum_i v_i = 0$. Therefore, by scaling $v$, we can
assume w.l.o.g. that $x = v + \pi$ is a distribution. We have the following claim.

Claim 3.3. Let $x$ be an arbitrary initial distribution. Let $M$ be an ergodic Markov chain with
stationary distribution $\pi$ and mixing time $T(\epsilon)$. We have $\|xM^{T(\epsilon)} - \pi\|_{TV} \le 2\epsilon\,\|x - \pi\|_{TV}$.

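Proof. (of Claim 3.3) The inequality holds trivially when $x = \pi$. Let $x \ne \pi$ be an arbitrary
distribution on $M$, $\delta = \|x - \pi\|_{TV} > 0$, and $y = x - \pi$. We decompose $y$ into a positive component
$y^+$ and a negative component $y^-$ by

$$y^+_i = \begin{cases} y_i & \text{if } y_i \ge 0 \\ 0 & \text{o.w.} \end{cases} \qquad y^-_i = \begin{cases} 0 & \text{if } y_i \ge 0 \\ -y_i & \text{o.w.} \end{cases}$$

Note that by definition, $\sum_i y^+_i = \sum_i y^-_i = \delta$. We define $z^+ = y^+/\delta$ and $z^- = y^-/\delta$. Observe that
$z^+$ and $z^-$ are distributions. By the definition of $\epsilon$-mixing time, we have

$$\|z^+ M^{T(\epsilon)} - \pi\|_{TV} \le \epsilon \quad \text{and} \quad \|z^- M^{T(\epsilon)} - \pi\|_{TV} \le \epsilon,$$

or equivalently, $\|z^+ M^{T(\epsilon)} - \pi\|_1 \le 2\epsilon$ and $\|z^- M^{T(\epsilon)} - \pi\|_1 \le 2\epsilon$. Now, we are ready to bound the
statistical distance $\|xM^{T(\epsilon)} - \pi\|_{TV}$ as follows.

$$\begin{aligned}
\|xM^{T(\epsilon)} - \pi\|_{TV} &= (1/2)\,\|xM^{T(\epsilon)} - \pi\|_1 \\
&= (1/2)\,\|(x - \pi)M^{T(\epsilon)}\|_1 \\
&= (1/2)\,\|(y^+ - y^-)M^{T(\epsilon)}\|_1 \\
&= (1/2)\,\|\delta z^+ M^{T(\epsilon)} - \delta z^- M^{T(\epsilon)}\|_1 \\
&= (\delta/2)\,\|(z^+ M^{T(\epsilon)} - \pi) - (z^- M^{T(\epsilon)} - \pi)\|_1 \\
&\le (\delta/2)\left(\|z^+ M^{T(\epsilon)} - \pi\|_1 + \|z^- M^{T(\epsilon)} - \pi\|_1\right) \le 2\epsilon\delta.
\end{aligned}$$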



□ 
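
Claim 3.3 can be sanity-checked numerically on a small chain. The following Python sketch is only an illustration: the chain (a lazy directed 3-cycle with uniform stationary distribution), the threshold $\epsilon = 1/8$, the list of starting distributions and all function names are assumptions made for the example, and a floating-point tolerance is added to the comparison.

```python
def tv(u, w):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(u, w))


def step(x, M):
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def power_step(x, M, t):
    """Compute x M^t by applying M t times."""
    for _ in range(t):
        x = step(x, M)
    return x


def mixing_time(M, pi, eps, max_t=1000):
    """Smallest t >= 1 such that every point-mass start is within eps of pi in TV."""
    n = len(M)
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(1, max_t + 1):
        rows = [step(r, M) for r in rows]
        if max(tv(r, pi) for r in rows) <= eps:
            return t
    raise ValueError("no mixing within max_t steps")


def claim_holds(x, M, pi, eps, T, tol=1e-12):
    """Check ||x M^T - pi||_TV <= 2 * eps * ||x - pi||_TV up to a tolerance."""
    return tv(power_step(x, M, T), pi) <= 2 * eps * tv(x, pi) + tol


M = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
pi = [1 / 3, 1 / 3, 1 / 3]
eps = 1 / 8
T = mixing_time(M, pi, eps)
starts = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.34, 0.33, 0.33]]
print(T, all(claim_holds(x, M, pi, eps, T) for x in starts))
```

The point of the claim is visible in the last starting distribution: it is already close to $\pi$, yet the distance still shrinks by the factor $2\epsilon$, which the plain definition of mixing time does not guarantee by itself.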

We now continue to prove Lemma 3.2. By Claim 3.3, $\|xM^{T(\epsilon)} - \pi\|_{TV} \le 2\epsilon\,\|x - \pi\|_{TV}$, i.e.
$\|xM^{T(\epsilon)} - \pi\|_1 \le 2\epsilon\,\|x - \pi\|_1$. Observing that $(xM^{T(\epsilon)} - \pi)$ and $(x - \pi)$ are simply $\lambda(M^{T(\epsilon)})v$ and $v$,
the above inequality means $\lambda(M^{T(\epsilon)})\,\|v\|_1 \le 2\epsilon\,\|v\|_1$, which implies $\lambda(M^{T(\epsilon)}) \le 2\epsilon$, as desired. □

We are now ready to prove our main claim. 

Proof. (of Claim 3.1) The idea is to reduce to the reversible case by considering the reversiblization
of $M^{T(\epsilon)}$. Let $\tilde{M}^{T(\epsilon)}$ be the time reversal of $M^{T(\epsilon)}$, and $R = M^{T(\epsilon)}\tilde{M}^{T(\epsilon)}$ be the reversiblization
of $M^{T(\epsilon)}$. As noted in Section 2, $\lambda(M^{T(\epsilon)}) = \sqrt{\lambda(R)}$. Let us recall (from Section 2) that $M$, $M^{T(\epsilon)}$,
and $\tilde{M}^{T(\epsilon)}$ all share the same stationary distribution $\pi$. Next, we claim that the $\epsilon$-mixing-time
of $R$ is 1. This is because $\|\varphi M^{T(\epsilon)}\tilde{M}^{T(\epsilon)} - \pi\|_{TV} \le \|\varphi M^{T(\epsilon)} - \pi\|_{TV} \le \epsilon$, where the second
inequality uses the definition of $T(\epsilon)$ and the first inequality holds since any Markov transition is
a contraction mapping: for any Markov transition, say $S = (s(i,j))$, and any vector $x$,
$\|xS\|_1 = \sum_j \left|\sum_i x_i\, s(i,j)\right| \le \sum_j \sum_i |x_i|\, s(i,j) = \sum_i |x_i| = \|x\|_1$; putting $x = \varphi M^{T(\epsilon)} - \pi$ and $S = \tilde{M}^{T(\epsilon)}$
gives the first inequality. Now, by Lemma 3.2, $\lambda(R) \le 2\epsilon$, and hence $\lambda(M^{T(\epsilon)}) = \sqrt{\lambda(R)} \le \sqrt{2\epsilon}$,
as desired. □



3.2 Bounding the Moment Generating Function 

We now prove Claim 3.2. We focus on the first inequality in the claim; the derivation of the second
inequality is similar and is deferred to Appendix B.

Claim 3.2 leads directly to a spectral version of the Chernoff bound for Markov chains.
Lezaud [12] and Wagner [16] give similar results for the case where the $f_i$ are the same for all $i$.
The analysis of [16] in particular can be extended to the case where the functions $f_i$ are different.
Here we present an alternative analysis and along the way will discuss the merit of our approach 
compared to the previous proofs. 

Recall that we define $X = \sum_{i=1}^{t} f_i(V_i)$. We start with the following observation, which has
been used previously [7, 12, 16].

$$\mathbb{E}[e^{rX}] = \|\varphi P_1 M P_2 \cdots M P_t\|_1, \qquad (7)$$

where the $P_i$ are diagonal matrices with diagonal entries $(P_i)_{j,j} = e^{r f_i(j)}$ for $j \in [n]$. One can verify
this fact by observing that each walk $V_1, \ldots, V_t$ is assigned the corresponding probability in the
product of $M$'s with the appropriate weight $e^{r \sum_i f_i(V_i)}$.
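
The identity (7) can be checked by brute force on a tiny instance. The Python sketch below is an illustration under assumptions of our own choosing (a 3-state chain, four arbitrary weight functions, $r = 0.3$, and hypothetical function names): it compares the expectation obtained by enumerating all walks with the 1-norm of the matrix product. The identity itself does not need the weight functions to share a common mean, so the example does not enforce that.

```python
import math
from itertools import product


def mgf_by_enumeration(phi, M, fs, r):
    """E[exp(r X)] by summing over all walks (v_1, ..., v_t); exponential in t."""
    n, t = len(M), len(fs)
    total = 0.0
    for walk in product(range(n), repeat=t):
        prob = phi[walk[0]]
        for a, b in zip(walk, walk[1:]):
            prob *= M[a][b]
        x = sum(f[v] for f, v in zip(fs, walk))
        total += prob * math.exp(r * x)
    return total


def mgf_by_matrix_product(phi, M, fs, r):
    """|| phi P_1 M P_2 ... M P_t ||_1 with P_i = diag(exp(r * f_i(j)))."""
    n = len(M)
    x = list(phi)
    for k, f in enumerate(fs):
        if k > 0:
            x = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]   # apply M
        x = [x[j] * math.exp(r * f[j]) for j in range(n)]                   # apply P_{k+1}
    return sum(abs(a) for a in x)


M = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
phi = [1.0, 0.0, 0.0]
fs = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.4], [0.0, 1.0, 1.0], [0.7, 0.3, 0.1]]   # arbitrary f_i
r = 0.3
print(mgf_by_enumeration(phi, M, fs, r), mgf_by_matrix_product(phi, M, fs, r))
```

The matrix-product form costs $O(t n^2)$ operations instead of $n^t$, and it is this form that the rest of the analysis manipulates.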

For ease of exposition, let us assume the $P_i$ are all the same at this moment. Let $P = P_1 =
\cdots = P_t$; then (7) becomes $\|\varphi(PM)^{t-1}P\|_1 = \langle \varphi(PM)^{t-1}P, \pi \rangle_\pi = \langle \varphi(PM)^t, \pi \rangle_\pi = \|\varphi(PM)^t\|_1$ (see
Lemma 3.3 below). Up to this point, our analysis is similar to previous work [12, 7, 16]. Now
there are two natural possible ways of bounding $\|\varphi(PM)^t\|_1 = \langle \varphi(PM)^t, \pi \rangle_\pi$.

• Approach 1. Bounding the spectral norm of the matrix $PM$. In this approach, we
observe that $\langle \varphi(PM)^t, \pi \rangle_\pi \le \|\varphi\|_\pi \|PM\|_\pi^t$, where $\|PM\|_\pi$ is the operator norm of the matrix
$PM$ induced by $\|\cdot\|_\pi$ (see, for example, the proof of Theorem 1 in [16]). This method decouples
the effect of each $PM$ as well as the initial distribution. When $M$ is reversible, $\|PM\|_\pi$ can be
bounded through Kato's spectral perturbation theory p3 H21 HI]. Alternatively, Wagner [16]
tackles the variational description of $\|PM\|_\pi$ directly, using only elementary techniques, and his
analysis can be generalized to irreversible chains.

• Approach 2. Inductively giving a bound for $x(PM)^i$ for all $i \le t$. In this approach,
we do not decouple the product $\varphi(PM)^t$. Instead, we trace the change of the vector $\varphi(PM)^i$
for each $i \le t$. As far as we know, only Healy [7] adopts this approach and his analysis is
restricted to regular graphs, where the stationary distribution is uniform. His analysis also
does not require perturbation theory.

Our proof here generalizes the second approach to any ergodic chain by only using elementary
methods. We believe this analysis is more straightforward for the following reasons. First, directly
tracing the change of the vector $\varphi(PM)^i$ for each step keeps the geometric insight that would
otherwise be lost in the decoupling analysis as in [HI [16]. Second, our analysis studies both the
reversible and irreversible chains in a unified manner. We do not use the reversiblization technique
to address the case for irreversible chains. While the reversiblization technique is a powerful tool
to translate an irreversible Markov chain problem into a reversible chain problem, this technique
operates in a blackbox manner; proofs based on this technique do not enable us to directly measure
the effect of the operator $PM$.

We now continue our analysis by using a framework similar to the one presented by Healy [7].
We remind the reader that we no longer assume the $P_i$'s are the same. Also, recall that $\mathbb{E}[e^{rX}] =
\|\varphi P_1 M P_2 \cdots M P_t\|_1 = \langle \varphi P_1 M P_2 \cdots M P_t, \pi \rangle_\pi = \|(\varphi P_1 M P_2 \cdots M P_t)^{\parallel}\|_\pi$. Let us briefly review the
strategy from [7].

• First, we observe that an arbitrary vector $x$ in $\mathbb{R}^n$ can be decomposed into its parallel component
(with respect to $\pi$) $x^{\parallel} = \langle x, \pi \rangle_\pi \pi$ and the perpendicular component $x^{\perp} = x - x^{\parallel}$ in the $L_\pi$
space. This decomposition helps in tracing the difference (in terms of the norm) between each
pair of $\varphi P_1 M \cdots P_i M$ and $\varphi P_1 M \cdots P_{i+1} M$ for $i \le t$, i.e. two consecutive steps of the random
walk. For this purpose, we need to understand the effects of the linear operators $M$ and $P_i$
when they are applied to an arbitrary vector.

• Second, after we compute the difference between each pair $x P_1 M \cdots P_i M$ and $x P_1 M \cdots P_{i+1} M$,
we set up a recursive relation, the solution of which yields the Chernoff bound.

We now follow this two-step framework to prove Claim 3.2.

The effects of the $M$ and $P$ operators Our way of tracing the vector $\varphi P_1 M P_2 \cdots M P_t$ relies
on the following two lemmas.

Lemma 3.3. (The effect of the $M$ operator) Let $M$ be an ergodic Markov chain with state
space $[n]$, stationary distribution $\pi$, and spectral expansion $\lambda = \lambda(M)$. Then

1. $\pi M = \pi$.

2. For every vector $y$ with $y \perp \pi$, we have $yM \perp \pi$ and $\|yM\|_\pi \le \lambda \|y\|_\pi$.

Lemma 3.4. (The effect of the $P$ operator) Let $M$ be an ergodic Markov chain with state space
$[n]$ and stationary distribution $\pi$. Let $f : [n] \to [0, 1]$ be a weight function with $\mathbb{E}_{v \leftarrow \pi}[f(v)] = \mu$.
Let $P$ be a diagonal matrix with diagonal entries $P_{j,j} = e^{r f(j)}$ for $j \in [n]$, where $r$ is a parameter
satisfying $0 \le r \le 1/2$. Then

1. $\|(\pi P)^{\parallel}\|_\pi \le 1 + (e^r - 1)\mu$.

2. $\|(\pi P)^{\perp}\|_\pi \le 2r\sqrt{\mu}$.

3. For every vector $y \perp \pi$, $\|(yP)^{\parallel}\|_\pi \le 2r\sqrt{\mu}\,\|y\|_\pi$.

4. For every vector $y \perp \pi$, $\|(yP)^{\perp}\|_\pi \le e^r \|y\|_\pi$.

Items 1 and 4 of Lemma 3.4 state that $P$ can stretch both the perpendicular and parallel
components along their original directions moderately. Specifically, a parallel vector is stretched by
at most a factor of $(1 + (e^r - 1)\mu) \approx 1 + O(r\mu)$ and a perpendicular vector is stretched by a factor
of at most $e^r \approx 1 + O(r)$. (Recall $r$ will be small.) On the other hand, items 2 and 3 of the lemma
state that $P$ can create a new perpendicular component from a parallel component and vice versa,
but the new component is of a much smaller size compared to the original component (i.e. only of
length at most $2r\sqrt{\mu}$ times the original component).
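
The four items of Lemma 3.4 can be evaluated directly for a concrete choice of $\pi$, $f$, $r$ and $y$. The Python sketch below is a numerical illustration only; the stationary distribution, weight function, value $r = 0.25$, test vector and helper names are all assumptions made for the example.

```python
import math


def norm_pi(u, pi):
    """pi-norm: sqrt(sum_x u_x^2 / pi_x)."""
    return math.sqrt(sum(a * a / q for a, q in zip(u, pi)))


def split(x, pi):
    """Parallel and perpendicular components of x with respect to pi."""
    c = sum(x)                                   # <x, pi>_pi
    par = [c * q for q in pi]
    return par, [a - b for a, b in zip(x, par)]


def check_lemma(pi, f, r, y):
    """Evaluate the four inequalities of Lemma 3.4 for one vector y perpendicular to pi."""
    mu = sum(fi * q for fi, q in zip(f, pi))
    P = [math.exp(r * fi) for fi in f]           # diagonal of P
    piP_par, piP_perp = split([q * p for q, p in zip(pi, P)], pi)
    yP_par, yP_perp = split([a * p for a, p in zip(y, P)], pi)
    small = 2 * r * math.sqrt(mu)
    return {
        "item1": norm_pi(piP_par, pi) <= 1 + (math.exp(r) - 1) * mu,
        "item2": norm_pi(piP_perp, pi) <= small,
        "item3": norm_pi(yP_par, pi) <= small * norm_pi(y, pi),
        "item4": norm_pi(yP_perp, pi) <= math.exp(r) * norm_pi(y, pi),
    }


pi = [0.5, 0.3, 0.2]
f = [1.0, 0.0, 0.5]                              # mean mu = 0.6 under pi
y = [0.5, -0.3, -0.2]                            # entries sum to zero, so y is perpendicular to pi
print(check_lemma(pi, f, r=0.25, y=y))
```

A single test vector does not prove the lemma, of course; the sketch is only meant to make the roles of the four bounds tangible before the proof.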

Remark We note that the key improvement of our analysis over that of Healy [7] stems from items
2 and 3 of Lemma 3.4. Healy [7] proved a bound with a factor of $(e^r - 1)/2 = O(r)$ for both items
for the special case of undirected and regular graphs. Our quantitative improvement to $O(r\sqrt{\mu})$
(which is tight) is the key for us to prove a multiplicative Chernoff bound without any restriction
on the spectral expansion of $M$.

Note that Lemma 3.3 is immediate from the definitions of $\pi$ and $\lambda$. We focus on the proof of
Lemma 3.4.



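Proof. (of Lemma 3.4) For the first item, note that by definition, $\|(\pi P)^{\parallel}\|_\pi = \langle \pi P, \pi \rangle_\pi =
\sum_i e^{r f(i)} \pi_i$. We simplify the sum using the fact that $e^{rx} \le 1 + (e^r - 1)x$ when $r, x \in [0, 1]$.

$$\|(\pi P)^{\parallel}\|_\pi = \sum_i e^{r f(i)} \pi_i \le \sum_i \left(1 + (e^r - 1) f(i)\right)\pi_i = \sum_i \pi_i + (e^r - 1)\sum_i f(i)\pi_i = 1 + (e^r - 1)\mu,$$

where the last equality uses the fact that $\sum_i \pi_i = 1$ and $\sum_i f(i)\pi_i = \mathbb{E}_{v \leftarrow \pi}[f(v)] = \mu$.
For the second item, by the Pythagorean theorem, we have

$$\|(\pi P)^{\perp}\|_\pi^2 = \|\pi P\|_\pi^2 - \|(\pi P)^{\parallel}\|_\pi^2 = \sum_i e^{2r f(i)}\pi_i - \Big(\sum_i e^{r f(i)}\pi_i\Big)^2.$$

Recall that $r \le 1/2$ and $f(i) \le 1$, and therefore $2r f(i) \le 1$. Using the fact that $1 + x \le e^x \le
1 + x + x^2$ when $x \in [0, 1]$, we have

$$\begin{aligned}
\sum_i e^{2r f(i)}\pi_i - \Big(\sum_i e^{r f(i)}\pi_i\Big)^2 &\le \sum_i \left(1 + 2r f(i) + 4r^2 f^2(i)\right)\pi_i - \Big(\sum_i (1 + r f(i))\pi_i\Big)^2 \\
&\le 1 + 2r\mu + 4r^2\mu - (1 + r\mu)^2 \\
&= 1 + 2r\mu + 4r^2\mu - (1 + 2r\mu + r^2\mu^2) \le 4r^2\mu.
\end{aligned}$$

The second inequality uses the fact that $\sum_i f^2(i)\pi(i) \le \sum_i f(i)\pi(i) = \mu$ (since $0 \le f(i) \le 1$). It
follows that $\|(\pi P)^{\perp}\|_\pi \le \sqrt{4r^2\mu} = 2r\sqrt{\mu}$.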

For the third item, by definition, $\|(yP)^{\parallel}\|_\pi = \langle yP, \pi \rangle_\pi$. Since $P$ is diagonal, we have $\langle yP, \pi \rangle_\pi =
\langle y, \pi P \rangle_\pi$. By definition, $y \perp \pi$ means $\langle y, \pi \rangle_\pi = 0$. Therefore, $\|(yP)^{\parallel}\|_\pi = \langle y, \pi P \rangle_\pi - \langle y, \pi \rangle_\pi =
\langle y, \pi(P - I) \rangle_\pi$. By the Cauchy-Schwarz inequality, we have $\langle y, \pi(P - I) \rangle_\pi \le \|y\|_\pi \|\pi(P - I)\|_\pi$.
We proceed to upper bound $\|\pi(P - I)\|_\pi$.

$$\|\pi(P - I)\|_\pi^2 = \sum_i \frac{\left(\pi_i (e^{r f(i)} - 1)\right)^2}{\pi_i} = \sum_i \left(e^{r f(i)} - 1\right)^2 \pi_i.$$

Using $e^{rx} \le 1 + (e^r - 1)x$ for $r, x \in [0, 1]$, we have $\sum_i (e^{r f(i)} - 1)^2 \pi_i \le \sum_i (1 + (e^r - 1) f(i) - 1)^2 \pi_i =
\sum_i (e^r - 1)^2 f^2(i)\pi_i \le (2r)^2 \sum_i f(i)\pi_i \le (2r)^2\mu$, where the second-to-last inequality uses the fact that
$e^r - 1 \le 2r$ for $r \in [0, 1]$ and $0 \le f(i) \le 1$. Therefore, $\|(yP)^{\parallel}\|_\pi \le \|\pi(P - I)\|_\pi \|y\|_\pi \le 2r\sqrt{\mu}\,\|y\|_\pi$.
Finally, for the fourth item, we have

$$\|(yP)^{\perp}\|_\pi^2 \le \|yP\|_\pi^2 = \sum_i \frac{y_i^2 e^{2r f(i)}}{\pi_i} \le \sum_i \frac{y_i^2 e^{2r}}{\pi_i} = e^{2r}\|y\|_\pi^2,$$

which implies $\|(yP)^{\perp}\|_\pi \le e^r \|y\|_\pi$. □



Recursive analysis We now provide a recursive analysis for the terms $x P_1 M \cdots M P_i$ for $i \le t$
based on our understanding of the effects from the linear operators $M$ and $P_i$. This completes the
proof for Claim 3.2.

Proof. (of Claim 3.2) First, recall that

$$\mathbb{E}[e^{rX}] = \|(\varphi P_1 M P_2 \cdots M P_t)^{\parallel}\|_\pi = \|(\varphi P_1 M P_2 \cdots M P_t M)^{\parallel}\|_\pi,$$

where the second equality comes from Lemma 3.3. Our choice of $r$ is $r = \min\{1/2,\ \log(1/\lambda)/2,\ 1 -
\sqrt{\lambda},\ (1 - \lambda)\delta/18\}$. We shall explain how we make such a choice as we walk through our analysis.

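We now trace the $\pi$-norm of both parallel and perpendicular components of the random walk
for each application of $P_i M$. Let $z_0 = \varphi$ and $z_i = z_{i-1} P_i M$ for $i \in [t]$. By the triangle inequality and
Lemmas 3.3 and 3.4, for every $i \in [t]$,

$$\begin{aligned}
\|z_i^{\parallel}\|_\pi = \|(z_{i-1} P_i M)^{\parallel}\|_\pi &= \|((z_{i-1}^{\parallel} + z_{i-1}^{\perp}) P_i M)^{\parallel}\|_\pi \le \|(z_{i-1}^{\parallel} P_i M)^{\parallel}\|_\pi + \|(z_{i-1}^{\perp} P_i M)^{\parallel}\|_\pi \\
&\le (1 + (e^r - 1)\mu)\,\|z_{i-1}^{\parallel}\|_\pi + (2r\sqrt{\mu})\,\|z_{i-1}^{\perp}\|_\pi,
\end{aligned}$$

and similarly,

$$\begin{aligned}
\|z_i^{\perp}\|_\pi &\le \|(z_{i-1}^{\parallel} P_i M)^{\perp}\|_\pi + \|(z_{i-1}^{\perp} P_i M)^{\perp}\|_\pi \le (2r\lambda\sqrt{\mu})\,\|z_{i-1}^{\parallel}\|_\pi + (e^r\lambda)\,\|z_{i-1}^{\perp}\|_\pi \\
&\le (2r\lambda\sqrt{\mu})\,\|z_{i-1}^{\parallel}\|_\pi + \sqrt{\lambda}\,\|z_{i-1}^{\perp}\|_\pi,
\end{aligned}$$

where the last inequality holds when $r \le (1/2)\log(1/\lambda)$, i.e. $e^r \le 1/\sqrt{\lambda}$. The reason to require
$r \le (1/2)\log(1/\lambda)$ is that we can guarantee the perpendicular component is shrinking (by a factor
of $\sqrt{\lambda} < 1$) after each step.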

Now let $\alpha_0 = \|z_0^\parallel\|_\pi = 1$ and $\beta_0 = \|z_0^\perp\|_\pi$, and define for $i \in [t]$,

$$\alpha_i = (1 + (e^r - 1)\mu)\alpha_{i-1} + (2r\sqrt{\mu})\beta_{i-1} \quad \text{and} \quad \beta_i = (2r\lambda\sqrt{\mu})\alpha_{i-1} + \sqrt{\lambda}\beta_{i-1}.$$

One can easily prove by induction that $\|z_i^\parallel\|_\pi \le \alpha_i$ and $\|z_i^\perp\|_\pi \le \beta_i$ for every $i \in [t]$, and that the $\alpha_i$'s are
strictly increasing. Therefore, bounding the moment generating function $E[e^{rX}] = \|z_t^\parallel\|_\pi \le \alpha_t$ boils
down to bounding the recurrence relation for $\alpha_i$ and $\beta_i$.

Observe that in the recurrence relation, only the coefficient $(1 + (e^r - 1)\mu) > 1$ while the
remaining coefficients $(2r\sqrt{\mu})$, $(2r\lambda\sqrt{\mu})$, and $\sqrt{\lambda}$ are all less than 1 if $r$ is chosen sufficiently small.
This suggests, intuitively, that the $\alpha_i$ terms will eventually dominate. This gives us a guide for reducing
the recurrence relation to a single variable as follows.

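First let us give an upper bound for $\beta_i$.

Claim 3.4. For every $i \in [t]$, $\beta_i \le 2r\left(\sum_{j=0}^{i-1}\sqrt{\lambda^{j+2}\mu}\right)\alpha_{i-1} + \sqrt{\lambda^i}\beta_0$.

Proof (of Claim 3.4). The claim follows by expanding the recurrence relation and using the fact
that the $\alpha_i$'s are increasing, i.e.,

$$\beta_i = 2r\lambda\sqrt{\mu}\alpha_{i-1} + \sqrt{\lambda}\beta_{i-1} = 2r\lambda\sqrt{\mu}\alpha_{i-1} + \sqrt{\lambda}\,2r\lambda\sqrt{\mu}\alpha_{i-2} + \sqrt{\lambda^2}\beta_{i-2} = \cdots = 2r\left(\sum_{j=0}^{i-1}\sqrt{\lambda^{j+2}\mu}\,\alpha_{i-1-j}\right) + \sqrt{\lambda^i}\beta_0.$$

Finally, by using the fact that the $\alpha_i$'s are strictly increasing, we complete the proof. □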
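The recurrence and the bound of Claim 3.4 are easy to check numerically. The sketch below iterates $\alpha_i$ and $\beta_i$ with $\alpha_0 = 1$ and compares each $\beta_i$ with the right-hand side of the claim. The parameter values and function names are illustrative assumptions, and a small relative tolerance absorbs floating-point rounding.

```python
import math

def run_recurrence(r, lam, mu, beta0, t):
    """Iterate alpha_i and beta_i for i = 0..t, starting from alpha_0 = 1."""
    alpha, beta = [1.0], [beta0]
    for _ in range(t):
        a, b = alpha[-1], beta[-1]
        alpha.append((1 + (math.exp(r) - 1) * mu) * a + 2 * r * math.sqrt(mu) * b)
        beta.append(2 * r * lam * math.sqrt(mu) * a + math.sqrt(lam) * b)
    return alpha, beta

def claim_rhs(i, r, lam, mu, alpha, beta0):
    """2r (sum_{j=0}^{i-1} sqrt(lam^(j+2) mu)) alpha_{i-1} + sqrt(lam^i) beta_0."""
    s = sum(math.sqrt(lam ** (j + 2) * mu) for j in range(i))
    return 2 * r * s * alpha[i - 1] + math.sqrt(lam ** i) * beta0

# Illustrative parameters (assumptions, not taken from the paper).
lam, mu, delta, beta0, t = 0.5, 0.3, 0.5, 2.0, 50
r = (1 - lam) * delta / 18
alpha, beta = run_recurrence(r, lam, mu, beta0, t)
increasing = all(alpha[i] > alpha[i - 1] for i in range(1, t + 1))
holds = all(beta[i] <= claim_rhs(i, r, lam, mu, alpha, beta0) * (1 + 1e-12)
            for i in range(1, t + 1))
print(increasing, holds)
```

For $i = 1$ the claim holds with equality, which is why the comparison uses a tolerance rather than a strict inequality.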

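We can then bound $\alpha_i$ by substituting $\beta_{i-1}$ using Claim 3.4.

Claim 3.5. $\alpha_1 \le (1 + (e^r - 1)\mu)\alpha_0 + 2r\sqrt{\mu}\beta_0$, and for every $2 \le i \le t$,

$$\alpha_i \le \left(1 + (e^r - 1)\mu + 4r^2\mu\sum_{j=0}^{i-2}\sqrt{\lambda^{j+2}}\right)\alpha_{i-1} + 2r\sqrt{\lambda^{i-1}\mu}\,\beta_0.$$

Proof. The case of $i = 1$ is trivial. For $2 \le i \le t$, this follows by applying the recurrence relation,
Claim 3.4, and the fact that $\alpha_{i-2} \le \alpha_{i-1}$:

$$\begin{aligned}
\alpha_i &= (1 + (e^r - 1)\mu)\alpha_{i-1} + (2r\sqrt{\mu})\beta_{i-1} \\
&\le (1 + (e^r - 1)\mu)\alpha_{i-1} + (2r\sqrt{\mu})\left(2r\left(\sum_{j=0}^{i-2}\sqrt{\lambda^{j+2}\mu}\right)\alpha_{i-2} + \sqrt{\lambda^{i-1}}\beta_0\right) \\
&\le \left(1 + (e^r - 1)\mu + 4r^2\mu\sum_{j=0}^{i-2}\sqrt{\lambda^{j+2}}\right)\alpha_{i-1} + 2r\sqrt{\lambda^{i-1}\mu}\,\beta_0.
\end{aligned}$$

□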

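For notational simplicity, let $A_1 = 1 + (e^r - 1)\mu$ and for $2 \le i \le t$, let

$$A_i = 1 + (e^r - 1)\mu + 4r^2\mu\sum_{j=0}^{i-2}\sqrt{\lambda^{j+2}}.$$

Claim 3.5 can then be expressed as $\alpha_i \le A_i\alpha_{i-1} + 2r\sqrt{\mu}\min\{\sqrt{\lambda^{i-1}}, 1\}\beta_0$ for every $i \in [t]$. By
expanding iteratively, we obtain

$$\begin{aligned}
\alpha_t &\le A_t(A_{t-1}(\cdots(A_2(A_1\alpha_0 + 2r\sqrt{\mu}\beta_0) + 2r\sqrt{\lambda\mu}\beta_0) + 2r\sqrt{\lambda^2\mu}\beta_0)\cdots) + 2r\sqrt{\lambda^{t-2}\mu}\beta_0) + 2r\sqrt{\lambda^{t-1}\mu}\beta_0 \\
&= (A_t\cdots A_1) + (A_t\cdots A_2(2r\sqrt{\mu}\beta_0)) + (A_t\cdots A_3(2r\sqrt{\lambda\mu}\beta_0)) + \cdots + A_t(2r\sqrt{\lambda^{t-2}\mu}\beta_0) + 2r\sqrt{\lambda^{t-1}\mu}\beta_0 \\
&\le \left(1 + 2r\sqrt{\mu}\beta_0 + 2r\sqrt{\lambda\mu}\beta_0 + 2r\sqrt{\lambda^2\mu}\beta_0 + \cdots + 2r\sqrt{\lambda^{t-1}\mu}\beta_0\right)\prod_{i=1}^t A_i \\
&\le \left(1 + \frac{2r\sqrt{\mu}\beta_0}{1 - \sqrt{\lambda}}\right)\prod_{i=1}^t A_i \le \left(1 + \frac{4r\sqrt{\mu}\beta_0}{1 - \lambda}\right)\prod_{i=1}^t A_i,
\end{aligned}$$

where the last inequality uses the fact that $1/(1 - \sqrt{\lambda}) \le 2/(1 - \lambda)$ for $\lambda \in [0, 1)$. It remains to
upper bound $\prod_i A_i$. Using $1 + x \le e^x$, we have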

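$$\prod_{i=1}^t A_i \le \exp\left\{(e^r - 1)\mu + \sum_{i=2}^t\left((e^r - 1)\mu + 4r^2\mu\sum_{j=0}^{i-2}\sqrt{\lambda^{j+2}}\right)\right\}.$$

The first two terms in the exponent lead to $\sum_{i=1}^t (e^r - 1)\mu = (e^r - 1)\mu t$. We now bound the last
sum in the exponent, which can be viewed as an "error" term due to the correlation between each
step of the random walk.

$$\sum_{i=2}^t 4r^2\mu\sum_{j=0}^{i-2}\sqrt{\lambda^{j+2}} \le 4r^2\mu\sum_{i=1}^t\sum_{j=0}^{\infty}\sqrt{\lambda^j} = 4r^2\mu t\sum_{j=0}^{\infty}\sqrt{\lambda^j} \le \frac{8r^2\mu t}{1 - \lambda},$$

where the last inequality uses $\sum_{j=0}^{\infty}\sqrt{\lambda^j} \le 1/(1 - \sqrt{\lambda}) \le 2/(1 - \lambda)$. Putting things together, we have

$$\prod_{i=1}^t A_i \le \exp\left\{(e^r - 1)\mu t + \frac{8r^2\mu t}{1 - \lambda}\right\} = \exp\left\{\left((e^r - 1) + \frac{8r^2}{1 - \lambda}\right)\mu t\right\},$$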



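and recalling that $\|\varphi^\parallel\|_\pi = 1$ and $\beta_0 = \|\varphi^\perp\|_\pi \le \|\varphi\|_\pi$,

$$E[e^{rX}] \le \left(1 + \frac{4r\sqrt{\mu}}{1 - \lambda}\right)\|\varphi\|_\pi\exp\left\{\left((e^r - 1) + \frac{8r^2}{1 - \lambda}\right)\mu t\right\}.$$

Recall that our goal is to choose an $r$ to bound $E[e^{rX}]/e^{r(1+\delta)\mu t}$. Choosing $r = \min\{1/2, \log(1/\lambda)/2, 1 -
\sqrt{\lambda}, (1 - \lambda)\delta/18\} = (1 - \lambda)\delta/18$, we complete the proof of Claim 3.2. □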
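To see why $r = (1 - \lambda)\delta/18$ produces the constant 36, one can compare exponents per unit of $\mu t$. The bound on $E[e^{rX}]$ contributes $(e^r - 1) + 8r^2/(1 - \lambda)$, and dividing by $e^{r(1+\delta)\mu t}$ subtracts $r(1 + \delta)$. Since $e^r - 1 \le r + r^2$ on $[0, 1]$, the net rate is at most $9r^2/(1 - \lambda) - r\delta = -\delta^2(1 - \lambda)/36$. The Python sketch below checks this arithmetic on a grid; the grid and the function name are illustrative assumptions.

```python
import math

def exponent_rates(lam, delta):
    """Return (net rate, target rate) per unit of mu*t for r = (1 - lam) * delta / 18."""
    r = (1 - lam) * delta / 18
    mgf_rate = (math.exp(r) - 1) + 8 * r * r / (1 - lam)  # exponent in the bound on E[e^{rX}]
    net = mgf_rate - r * (1 + delta)                      # after dividing by e^{r(1+delta) mu t}
    target = -delta ** 2 * (1 - lam) / 36
    return net, target

grid = [k / 20 for k in range(1, 20)]  # lam and delta range over 0.05, ..., 0.95
ok = all(net <= target + 1e-15
         for lam in grid for delta in grid
         for net, target in [exponent_rates(lam, delta)])
print(ok)
```

The grid covers only $0 < \delta < 1$ and $0 < \lambda < 1$; it illustrates the calculation and is not a substitute for the analytic argument.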

Before completing this subsection, we make a final remark. Our proof also works when the
means $E_\pi[f_i(v)]$ differ for different values of $i$, which results in a more general Chernoff type
bound based on spectral expansions. This more general result, as far as we know, has not been
noted in the existing literature with the exception of Healy [7], who gave a Chernoff bound of this
kind with stronger assumptions for regular graphs, although the analysis given by Lezaud [12] or
Wagner [16] also appears to be generalizable. On the other hand, this strengthened version
of Claim 3.2 does not seem to be sufficient to remove the requirement that the $E_\pi[f_i(v)]$ are the same
for Theorem 3.1.

3.3 Continuous Time Case 

We now generalize our main result to cover continuous time chains. The analysis is similar to
the one presented by Lezaud [12] and is deferred to Appendix C.

Theorem 3.5. Let $\Lambda$ be the generator of an ergodic continuous time Markov chain with state
space $[n]$ and mixing time $T = T(\epsilon)$. Let $\{v_t : t \in \mathbb{R}^+\}$ be a random walk on the chain starting
from an initial distribution $\varphi$ such that $v_t$ represents the state where the walk stays at time $t$. Let
$\{f_t : [n] \to [0, 1] \mid t \in \mathbb{R}^+\}$ be a family of functions such that $\mu = E_{v \leftarrow \pi}[f_t(v)]$ for all $t$. Define the
weight over the walk $\{v_s : s \in \mathbb{R}^+\}$ up to time $t$ by $X_t = \int_0^t f_s(v_s)\,ds$. There exists a constant $c$ such
that

1. $\Pr[X_t \ge (1 + \delta)\mu t] \le \begin{cases} c\|\varphi\|_\pi\exp(-\delta^2\mu t/(72T)) & \text{for } 0 \le \delta \le 1 \\ c\|\varphi\|_\pi\exp(-\delta\mu t/(72T)) & \text{for } \delta > 1 \end{cases}$

2. $\Pr[X_t \le (1 - \delta)\mu t] \le c\|\varphi\|_\pi\exp(-\delta^2\mu t/(72T))$ for $0 \le \delta \le 1$.

A Construction of Mixing Markov Chain with No Spectral Expansion

In this section, we show that any ergodic Markov chain $M$ with mixing time $T = T(1/4)$ can be
modified to a chain $M'$ such that $M'$ has mixing time $O(T)$ but spectral expansion $\lambda(M') = 1$.

Our modification is based on the following simple observation. Let $M'$ be an ergodic Markov
chain with stationary distribution $\pi'$. If there exist two states $v$ and $v'$ such that (i) $M'_{v,v'} = 1$,
i.e., state $v$ leaves to state $v'$ with probability 1, and (ii) $M'_{u,v'} = 0$ for all $u \ne v$, i.e., the only
state that transitions to $v'$ is $v$, then $\lambda(M') = 1$. Note that in this case, $\pi'(v) = \pi'(v')$ since all probability
mass from $v$ leaves to $v'$, which receives probability mass only from $v$. Consider a distribution $x$
whose probability mass all concentrates at $v$, i.e., $x_v = 1$ and $x_u = 0$ for all $u \ne v$. One step of the walk
from $x$ results in the distribution $xM'$ whose probability mass all concentrates at $v'$. By definition,
$\|x\|_{\pi'} = \|xM'\|_{\pi'}$ and thus $\lambda(M') = 1$.

Now, let $M$ be an ergodic Markov chain with mixing time $T = T(1/4)$ and stationary distribution
$\pi$. We shall modify $M$ to a Markov chain $M'$ that preserves the mixing-time and satisfies the
above property. We mention that it is not hard to modify $M$ to satisfy the above property. The
challenge is to do so while preserving the mixing-time. Our construction is as follows.

• For every state $v$ in $M$, we "split" it into three states $(v, in)$, $(v, mid)$, $(v, out)$ in $M'$.

• For every state $(v, in)$ in $M'$, we set $M'_{(v,in),(v,in)} = M'_{(v,in),(v,mid)} = 1/2$, i.e., $(v, in)$ stays in
the same state with probability 1/2 and transits to $(v, mid)$ with probability 1/2.

• For every state $(v, mid)$ in $M'$, we set $M'_{(v,mid),(v,out)} = 1$, i.e., $(v, mid)$ always leaves to
$(v, out)$.

• For every pair of states $u, v$ in $M$, we set the transition probability $M'_{(u,out),(v,in)}$ from $(u, out)$
to $(v, in)$ to be $M_{u,v}$.

It is not hard to verify that the modified chain $M'$ is well-defined, ergodic, and satisfies the
aforementioned property (namely, $(v, mid)$ leaves to $(v, out)$ with probability 1 and is the only state
that transits to $(v, out)$). It remains to show that $M'$ has mixing-time $O(T)$. Toward this goal, let
us define yet another Markov chain $C$ that consists of three states $\{in, mid, out\}$ with transition
probabilities $C_{in,in} = C_{in,mid} = 1/2$, and $C_{mid,out} = C_{out,in} = 1$. Clearly, $C$ is ergodic and has
constant mixing-time. Now, the key observation is that a random walk on $M'$ can be decomposed
into walks on $M$ and $C$ in the following sense: every step on $M'$ corresponds to a step on $C$ in a
natural way, and one step on $M'$ from $(u, out)$ to $(v, in)$ can be identified as a step from $u$ to $v$ in
$M$. Note that the walks on $M$ and $C$ are independent, and in expectation, every 4 steps of the walk
on $M'$ induce one step of the walk on $M$. It is not hard to see from these observations that the mixing
time of $M'$ is at most $8T$.
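The construction can be made concrete with a short Python sketch that builds $M'$ from a small chain and checks the observation behind $\lambda(M') = 1$ in exact arithmetic. The 2-state chain `M` is a hypothetical example. The weights 1/2, 1/4, 1/4 on the in, mid, and out copies are an assumption derived here from the stationary distribution of $C$ (they are not stated in the text), and the code verifies them rather than taking them on faith.

```python
from fractions import Fraction as F

IN, MID, OUT = 0, 1, 2

def split_chain(M):
    """Build M' on states (v, in), (v, mid), (v, out); state (v, part) has index 3*v + part."""
    n = len(M)
    Mp = [[F(0)] * (3 * n) for _ in range(3 * n)]
    for v in range(n):
        Mp[3 * v + IN][3 * v + IN] = F(1, 2)       # (v, in) stays put with probability 1/2
        Mp[3 * v + IN][3 * v + MID] = F(1, 2)      # or moves to (v, mid)
        Mp[3 * v + MID][3 * v + OUT] = F(1)        # (v, mid) always moves to (v, out)
        for u in range(n):
            Mp[3 * u + OUT][3 * v + IN] = M[u][v]  # (u, out) -> (v, in) follows M
    return Mp

def step(x, A):
    """One step of the walk: the row vector x times the matrix A."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]

def sq_norm(x, pi):
    """Squared pi-norm: sum_i x_i^2 / pi_i."""
    return sum(xi * xi / p for xi, p in zip(x, pi))

# Hypothetical 2-state chain with stationary distribution (2/3, 1/3).
M = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
pi = [F(2, 3), F(1, 3)]
Mp = split_chain(M)
# Candidate stationary distribution of M': weights 1/2, 1/4, 1/4 on in, mid, out.
pi_p = [p * w for p in pi for w in (F(1, 2), F(1, 4), F(1, 4))]

x = [F(0)] * 6
x[3 * 0 + MID] = F(1)                           # all mass on (0, mid)
x_perp = [xi - p for xi, p in zip(x, pi_p)]     # component of x perpendicular to pi'

rows_ok = all(sum(row) == 1 for row in Mp)
stationary_ok = step(pi_p, Mp) == pi_p
norm_preserved = sq_norm(step(x_perp, Mp), pi_p) == sq_norm(x_perp, pi_p)
print(rows_ok, stationary_ok, norm_preserved)
```

The last check uses the perpendicular component of $x$: one step of $M'$ leaves its $\pi'$-norm unchanged, which is exactly what forces the spectral expansion to equal 1.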

B The Bound When the Sum Is Less Than Mean 

We now prove the remaining part of Claim 3.2, i.e.,

Claim B.1. Let $M$ be an ergodic Markov chain with state space $[n]$, stationary distribution $\pi$, and
spectral expansion $\lambda = \lambda(M)$. Let $(V_1, \ldots, V_t)$ denote a $t$-step random walk on $M$ starting from an
initial distribution $\varphi$ on $[n]$, i.e., $V_1 \leftarrow \varphi$. For every $i \in [t]$, let $f_i : [n] \to [0, 1]$ be a weight function
at step $i$ such that the expected weight $E_{v \leftarrow \pi}[f_i(v)] = \mu$ for all $i$. Define the total weight of the walk
$(V_1, \ldots, V_t)$ by $X = \sum_{i=1}^t f_i(V_i)$. There exists some constant $c$ and a parameter $r > 0$ that depends
only on $\lambda$ and $\delta$ such that

$$\frac{E[e^{-rX}]}{e^{-r(1-\delta)\mu t}} \le c\|\varphi\|_\pi\exp(-\delta^2(1 - \lambda)\mu t/36) \quad \text{for } 0 \le \delta \le 1.$$

We mimic the proof strategy presented in Section 3.2. Observe first that

$$E[e^{-rX}] = \|xP_1MP_2\cdots MP_t\|_1,$$

where $x = \varphi$ is the initial distribution and the $P_i$'s are diagonal matrices with diagonal entries $(P_i)_{j,j} = e^{-rf_i(j)}$ for $j \in [n]$. Thus, our goal
is to bound the moment generating function $E[e^{-rX}]$.

Similar to the analysis presented in Section 3.2, we need to understand the effect of the $P_i$
operators.

Lemma B.1. Let $M$ be an ergodic Markov chain with state space $[n]$ and stationary distribution
$\pi$. Let $f : [n] \to [0, 1]$ be a weight function with $E_{v \leftarrow \pi}[f(v)] = \mu$. Let $P$ be a diagonal matrix with
diagonal entries $P_{j,j} = e^{-rf(j)}$ for $j \in [n]$, where $r$ is a parameter satisfying $0 \le r \le 1/2$. We have

• $\|(\pi P)^\parallel\|_\pi \le 1 - r\mu + \frac{r^2}{2}\mu$.

• $\|(\pi P)^\perp\|_\pi \le \sqrt{2}r\sqrt{\mu}$.

• For every vector $y \perp \pi$, $\|(yP)^\parallel\|_\pi \le r\sqrt{\mu}\|y\|_\pi$.

• For every vector $y \perp \pi$, $\|(yP)^\perp\|_\pi \le \|y\|_\pi$.



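Proof. For the first item, we have

$$\|(\pi P)^\parallel\|_\pi = \sum_{i \le n} e^{-rf(i)}\pi_i \le \sum_{i \le n}\left(1 - rf(i) + \frac{r^2}{2}f(i)\right)\pi_i \le 1 - r\mu + \frac{r^2}{2}\mu.$$

The first inequality holds because $e^{-rx} \le 1 - rx + r^2x/2$ for $0 \le x \le 1$.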
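(2). We may use the Pythagorean theorem and get

$$\begin{aligned}
\|(\pi P)^\perp\|_\pi^2 &= \|\pi P\|_\pi^2 - \|(\pi P)^\parallel\|_\pi^2 = \sum_{i \le n} e^{-2rf(i)}\pi_i - \Big(\sum_{i \le n} e^{-rf(i)}\pi_i\Big)^2 \\
&\le \sum_{i \le n}\big(1 - 2rf(i) + 2r^2f^2(i)\big)\pi_i - \Big(\sum_{i \le n}(1 - rf(i))\pi_i\Big)^2 \\
&\le 2r^2\mu - r^2\mu^2 \\
&\le 2r^2\mu.
\end{aligned}$$

This implies $\|(\pi P)^\perp\|_\pi \le \sqrt{2}r\sqrt{\mu}$.

(3). First, since $y \perp \pi$, we have $\langle y, \pi\rangle_\pi = 0$. Next notice that by the Cauchy-Schwarz inequality,

$$\|(yP)^\parallel\|_\pi = \langle y, \pi P\rangle_\pi - \langle y, \pi I\rangle_\pi = \langle y, \pi(P - I)\rangle_\pi \le \|y\|_\pi\|\pi(P - I)\|_\pi.$$

We next bound $\|\pi(P - I)\|_\pi$. Specifically,

$$\begin{aligned}
\|\pi(P - I)\|_\pi^2 &= \sum_{i \le n}(e^{-rf(i)} - 1)^2\pi_i \\
&= \sum_{i \le n}(1 - e^{-rf(i)})^2\pi_i \\
&\le \sum_{i \le n}(rf(i))^2\pi_i \\
&\le r^2\sum_{i \le n} f(i)\pi_i \\
&\le r^2\mu.
\end{aligned}$$

Therefore, $\|(yP)^\parallel\|_\pi \le r\sqrt{\mu}\|y\|_\pi$.

(4). We have $\|(yP)^\perp\|_\pi \le \|yP\|_\pi \le \|y\|_\pi$. □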
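The proof of Lemma B.1 appears to rest on three elementary scalar bounds for $0 \le r \le 1/2$ and $0 \le x \le 1$: $e^{-rx} \le 1 - rx + r^2x/2$ for item (1), $e^{-2rx} \le 1 - 2rx + 2r^2x^2$ for item (2), and $(1 - e^{-rx})^2 \le (rx)^2$ for item (3). The second and third are inferred from the surrounding derivation rather than stated explicitly, so treat them as assumptions. The Python sketch below checks all three on a grid; the grid resolution and the function name are arbitrary choices.

```python
import math

def scalar_gaps(r, x):
    """Right-hand side minus left-hand side of three scalar bounds; each should be >= 0."""
    return (
        (1 - r * x + r * r * x / 2) - math.exp(-r * x),              # used in item (1)
        (1 - 2 * r * x + 2 * r * r * x * x) - math.exp(-2 * r * x),  # used in item (2)
        (r * x) ** 2 - (1 - math.exp(-r * x)) ** 2,                  # used in item (3)
    )

rs = [k / 40 for k in range(0, 21)]  # r in [0, 1/2]
xs = [k / 20 for k in range(0, 21)]  # x in [0, 1]
worst = min(g for r in rs for x in xs for g in scalar_gaps(r, x))
print(worst >= -1e-15)
```

All three follow from $1 - y \le e^{-y} \le 1 - y + y^2/2$ for $y \ge 0$, so the grid check is only a guard against transcription slips.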

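Now we proceed to prove Claim B.1 using Lemma B.1.

Proof (of Claim B.1). Let us recall that $z_0 = x$ and $z_i = z_{i-1}P_iM$ for $i \in [t]$. Lemma B.1 gives us

$$\|z_i^\parallel\|_\pi \le \left(1 - r\mu + \frac{r^2}{2}\mu\right)\|z_{i-1}^\parallel\|_\pi + r\sqrt{\mu}\|z_{i-1}^\perp\|_\pi$$

and

$$\|z_i^\perp\|_\pi \le \sqrt{2}\lambda r\sqrt{\mu}\|z_{i-1}^\parallel\|_\pi + \lambda\|z_{i-1}^\perp\|_\pi.$$

Following our strategy presented in Section 3.2, let $\alpha_0 = \|z_0^\parallel\|_\pi = 1$ and $\beta_0 = \|z_0^\perp\|_\pi$ and define for
each $i \in [t]$,

$$\alpha_i = (1 - r\mu + \mu r^2/2)\alpha_{i-1} + r\sqrt{\mu}\beta_{i-1} \qquad (8)$$

and

$$\beta_i = (\sqrt{2}r\lambda\sqrt{\mu})\alpha_{i-1} + \lambda\beta_{i-1}. \qquad (9)$$

We can inductively show that $\|z_i^\parallel\|_\pi \le \alpha_i$ and $\|z_i^\perp\|_\pi \le \beta_i$ for each $i \in [t]$.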

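Our goal is now to give an upper bound for $\alpha_i$ and $\beta_i$. Also, we shall set $r = \min\{1/2, \log(1/\lambda)/2, 1 -
\sqrt{\lambda}, (1 - \lambda)\delta/8\}$ throughout our analysis. Next, we recursively substitute the value of $\beta_i$ from Eq. (9)
into Eq. (8), which yields

$$\alpha_i = (1 - (r - r^2/2)\mu)\alpha_{i-1} + \sqrt{2}r^2\mu\lambda\alpha_{i-2} + \cdots + \sqrt{2}r^2\mu\lambda^{i-1}\alpha_0 + r\sqrt{\mu}\lambda^{i-1}\beta_0. \qquad (10)$$

Using the fact that $r \le 1 - \sqrt{\lambda}$ and that $\alpha_i \ge (1 - (r - r^2/2)\mu)\alpha_{i-1}$ for all $i \ge 1$, we may conclude
$\sqrt{\lambda}\alpha_{i-1} \le \alpha_i$. Now (10) becomes

$$\alpha_i \le \left(1 - (r - r^2/2)\mu + \sqrt{2}r^2\mu\sum_{j=1}^{i-1}\sqrt{\lambda^j}\right)\alpha_{i-1} + r\sqrt{\mu}\lambda^{i-1}\beta_0. \qquad (11)$$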

Next, let us define $A_i$ as follows,

$$A_i = 1 - (r - r^2/2)\mu + \sqrt{2}r^2\mu\sum_{j=1}^{i-1}\sqrt{\lambda^j}.$$

We then have

$$\alpha_i \le A_i\alpha_{i-1} + r\sqrt{\mu}\lambda^{i-1}\beta_0.$$

Therefore, we can see that 



On the other hand, we can see that 



*(n*)(i + *^). 



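$$\begin{aligned}
\prod_{i \le t} A_i &\le \exp\left\{\sum_{i \le t}\big(-(r - r^2/2)\mu\big) + \sum_{i \le t}\sqrt{2}r^2\mu\sum_{j=1}^{i-1}\sqrt{\lambda^j}\right\} \\
&\le \exp\left\{-(r - r^2/2)\mu t + \frac{\sqrt{2}r^2\mu t}{1 - \sqrt{\lambda}}\right\} \\
&= \exp\left\{-r\mu t + \left(\frac{r^2}{2} + \frac{\sqrt{2}r^2}{1 - \sqrt{\lambda}}\right)\mu t\right\}
\end{aligned}$$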



£ cxp <j - /'/// + ( ^ - fit 



Notice that l + /3 ^ = O (^^T ) ■ B ^ usin § the fact r = min{l/2, log(l/A) /2, 1 - VA, (1 - A)<5/8}, 
we complete the proof. □ 



C Continuous Time Case 

This section proves Theorem 3.5.

Proof (of Theorem 3.5). We mimic the strategy from Lezaud [12] to discretize the chain in $b$ time
units, i.e., consider the states $v_{ib}$ for $i = 0, 1, \ldots, t/b$. The stationary distribution of this discretized
chain $v_{ib}$ is the same as that of the original continuous time chain, and hence $\mu = E_\pi f_t(v_t) = E_\pi f_{ib}(v_{ib})$.
Now by Theorem 3.1, we have



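1. $\Pr\left[\sum_{i=1}^{t/b} f_{ib}(v_{ib}) \ge (1 + \delta)\mu\frac{t}{b}\right] \le \begin{cases} c\|\varphi\|_\pi\exp(-\delta^2\mu(t/b)/(72T/b)) & \text{for } 0 \le \delta \le 1 \\ c\|\varphi\|_\pi\exp(-\delta\mu(t/b)/(72T/b)) & \text{for } \delta > 1 \end{cases}$

2. $\Pr\left[\sum_{i=1}^{t/b} f_{ib}(v_{ib}) \le (1 - \delta)\mu\frac{t}{b}\right] \le c\|\varphi\|_\pi\exp(-\delta^2\mu(t/b)/(72T/b))$ for $0 \le \delta \le 1$.

Notice that the mixing time for the discretized chain is $T/b$ while the total number of steps here
is $t/b$. In the exponents, the term $b$ appears in both the numerator and the denominator, and the two
cancel each other. Taking the limit as $b \to 0$ completes the proof [12]. □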



